Integrative machine learning approach for multi-class SCOP protein fold classification

Authors

  • Aik Choon Tan
  • David R. Gilbert
  • Yves Deville

Abstract

Classification and prediction of protein structure has been a central research theme in structural bioinformatics. Because proteins are distributed unevenly across the multiple SCOP classes, most discriminative machine learning methods suffer from the well-known ‘false positives’ problem when learning on this type of problem. We have devised eKISS, an ensemble machine learning approach specifically designed to increase the coverage of positive examples when learning from imbalanced multi-class data sets. We have applied eKISS to classify 25 SCOP folds and show that our learning system improves on classical learning methods.
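
The imbalance problem described above can be made concrete with a small sketch. This code does not implement eKISS (its algorithm is not described here). It only shows, on hypothetical toy labels f1, f2 and f3 (not SCOP data), how a one-against-others decomposition turns an imbalanced multi-class set into skewed binary tasks, and how per-class coverage (recall) of positive examples and per-class false positives can be measured. All function names are hypothetical.

```python
from collections import Counter


def one_vs_rest_ratios(labels):
    """For each class, return (positives, negatives) in its one-against-others task."""
    counts = Counter(labels)
    total = len(labels)
    return {c: (n, total - n) for c, n in counts.items()}


def per_class_coverage(y_true, y_pred):
    """Coverage (recall): fraction of each class's true members predicted as that class."""
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    support = Counter(y_true)
    return {c: hits[c] / support[c] for c in support}


def per_class_false_positives(y_true, y_pred):
    """Count examples wrongly assigned to each class (false positives for that class)."""
    fp = Counter(p for t, p in zip(y_true, y_pred) if t != p)
    return {c: fp[c] for c in set(y_true) | set(y_pred)}


if __name__ == "__main__":
    # Hypothetical toy data: one majority fold and two minority folds.
    y_true = ["f1"] * 6 + ["f2"] * 3 + ["f3"]
    # A hypothetical classifier biased toward the majority fold.
    y_pred = ["f1"] * 6 + ["f1", "f2", "f2"] + ["f1"]

    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print("accuracy:", accuracy)
    print("one-vs-rest (pos, neg):", one_vs_rest_ratios(y_true))
    print("coverage:", per_class_coverage(y_true, y_pred))
    print("false positives:", per_class_false_positives(y_true, y_pred))
```

In this toy run overall accuracy is 0.8, yet the rarest fold f3 has a coverage of 0.0 and the majority fold f1 absorbs both misclassified examples as false positives. This is the kind of gap that an approach aimed at increasing the coverage of positive examples targets.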


Similar resources

Multi-class protein fold recognition using support vector machines and neural networks

MOTIVATION Protein fold recognition is an important approach to structure discovery without relying on sequence similarity. We study this approach with new multi-class classification methods and examined many issues important for a practical recognition system. RESULTS Most current discriminative methods for protein fold prediction use the one-against-others method, which has the well-known '...

Multi-class Protein Fold Recognition Through a Symbolic-Statistical Framework

Protein fold recognition is an important problem in molecular biology. Machine learning symbolic approaches have been applied to automatically discover local structural signatures and relate these to the concept of fold in SCOP. However, most of these methods cannot handle uncertainty being therefore not able to solve multiple prediction problems. In this paper we present an application of the ...

Multi-class protein fold classification using a new ensemble machine learning approach.

Protein structure classification represents an important process in understanding the associations between sequence and structure as well as possible functional and evolutionary relationships. Recent structural genomics initiatives and other high-throughput experiments have populated the biological databases at a rapid pace. The amount of structural data has made traditional methods such as man...

A novel ensemble of classifiers for protein fold recognition

Predicting the three-dimensional structure of a protein from its amino acid sequence is an important problem in bioinformatics and a challenging task for machine-learning algorithms. We propose a new ensemble of K-local hyperplane based on random subspace and feature selection, and tested it on a real-world dataset containing 27 SCOP folds from [C. Ding, I. Dubchak, Multi-class protein fold rec...

Protein Fold Classification using Kohonen's Self-Organizing Map

Protein fold classification is an important problem in bioinformatics and a challenging task for machine-learning algorithms. In this paper we present a solution which classifies protein folds using Kohonen’s Self-Organizing Map (SOM) and a comparison between few approaches for protein fold classification. We use SOM, Fisher Linear Discriminant Analysis (FLD), K-Nearest Neighbour (KNN), Support...

Publication date: 2003